Sampling Circulant Matrix Approach: A Comparison of Recent Kernel Matrix Approximation Techniques in Ridge Kernel Regression
Author
Abstract
As part of a survey of state-of-the-art kernel approximation algorithms, we present a new sampling algorithm for circulant matrix construction that performs fast kernel matrix inversion in kernel ridge regression, and we compare its theoretical and experimental performance with that of multilevel circulant kernel approximation, incomplete Cholesky decomposition, and random features, all recent advances in the literature. In particular, the new circulant approach rivals the other three algorithms in accuracy and executes with a time complexity of mixed competitiveness, warranting further study.

1 Survey of the Problem: Ridge Regression

Ridge regression, also known as Tikhonov regularization, appears as early as [1], though [2] standardized the approach in statistics. Formally, given a D × N data matrix X and a corresponding vector of labels y, ridge regression estimates f(x) = y with f̂(x) = wx, where w is a weight vector minimizing

||wX − y||² + ||wΓ_D||²,   (1)

where Γ_D is a regularization matrix. The exact solution is

ŵ^T = (XX^T + Γ_D Γ_D^T)^{-1} X y^T.   (2)

Typically, we choose Γ_D to be a diagonal matrix with positive entries. Herein, we replace Γ_D Γ_D^T with λI_D, where I_D is the D × D identity matrix.
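The closed-form solution (2), with Γ_D Γ_D^T replaced by λI_D, can be sketched in a few lines of NumPy. This is a minimal illustration of the formula, not the paper's circulant algorithm; the function name, the feature-by-sample data layout, and the test data are all illustrative assumptions.

```python
import numpy as np

def ridge_weights(X, y, lam=1.0):
    """Closed-form ridge solution: (X X^T + lam I_D)^{-1} X y^T.

    Assumes X is a D x N data matrix (features by samples) and y is a
    length-N label vector, matching the conventions in the text above.
    """
    D = X.shape[0]
    # Solve the regularized normal equations instead of forming an
    # explicit inverse, which is cheaper and numerically safer.
    return np.linalg.solve(X @ X.T + lam * np.eye(D), X @ y)

# Illustrative check: noiseless labels generated by a known weight vector.
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 50))       # D = 3 features, N = 50 samples
w_true = np.array([1.0, -2.0, 0.5])
y = w_true @ X                         # y = wX, no noise
w_hat = ridge_weights(X, y, lam=1e-6)  # tiny lambda: nearly unregularized
print(np.round(w_hat, 3))
```

With a small λ and more samples than features, the estimate recovers the generating weights almost exactly; larger λ shrinks ŵ toward zero, trading bias for variance.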
Similar resources
Robust visual tracking via speedup multiple kernel ridge regression
Most of the tracking methods try to build up feature spaces to represent the appearance of the target. However, limited by the complex structure of the distribution of features, the feature spaces constructed in a linear manner cannot characterize the nonlinear structure well. We propose an appearance model based on kernel ridge regression for visual tracking. Dense sampling is fulfilled around...
Full text

Random Fourier Features for Kernel Ridge Regression: Approximation Bounds and Statistical Guarantees
Random Fourier features is one of the most popular techniques for scaling up kernel methods, such as kernel ridge regression. However, despite impressive empirical results, the statistical properties of random Fourier features are still not well understood. In this paper we take steps toward filling this gap. Specifically, we approach random Fourier features from a spectral matrix approximation...
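The random Fourier features technique summarized above can be sketched briefly; the following is a minimal illustration for the Gaussian (RBF) kernel under standard assumptions (frequencies drawn from a Gaussian, random phases), not the specific construction analyzed in the cited paper.

```python
import numpy as np

def rff_map(X, n_features=500, seed=0):
    """Random Fourier feature map z(x) approximating the Gaussian kernel
    k(x, x') = exp(-||x - x'||^2 / 2), so that z(x) @ z(x') ~ k(x, x').

    X is an n x d array of samples in rows (illustrative convention).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_features))    # frequencies ~ N(0, I)
    b = rng.uniform(0.0, 2 * np.pi, n_features)  # random phase offsets
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Compare the approximate kernel matrix against the exact one.
rng = np.random.default_rng(1)
X = rng.standard_normal((20, 4))
Z = rff_map(X, n_features=4000)
K_approx = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_exact = np.exp(-sq_dists / 2.0)
print(np.abs(K_approx - K_exact).max())
```

The entrywise error decays like O(1/√m) in the number of random features m, which is why a few thousand features already give a close approximation on this toy example.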
Full text

Distributed Adaptive Sampling for Kernel Matrix Approximation
Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or k-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix Kn requires at least O(n²) time and space for n samples. Recent works [1, 9] show that sampling points with replacement according to their ridge leverage scores (RLS) generates small dictionaries of r...
Full text

Provably Useful Kernel Matrix Approximation in Linear Time
We give the first algorithm for kernel Nyström approximation that runs in linear time in the number of training points and is provably accurate for all kernel matrices, without dependence on regularity or incoherence conditions. The algorithm projects the kernel onto a set of s landmark points sampled by their ridge leverage scores, requiring just O(ns) kernel evaluations and O(ns) additional r...
Full text

Matrix Approximation for Large-scale Learning
Modern learning problems in computer vision, natural language processing, computational biology, and other areas are often based on large data sets of tens of thousands to millions of training instances. However, several standard learning algorithms, such as kernel-based algorithms, e.g., Support Vector Machines, Kernel Ridge Regression, Kernel PCA, do not easily scale to such orders of magnitu...
Full text